Overview

Dataset info

Number of variables: 16
Number of observations: 484568
Missing cells: 121063 (1.6%)
Duplicate rows: 44983 (9.3%)
Total size in memory: 57.3 MiB
Average record size in memory: 124.0 B
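
The percentages in the dataset info follow from the raw counts. A minimal sketch that recomputes them, assuming the missing-cell share is taken over all cells (variables times observations) and that 1 MiB is 1024² bytes; the variable names are illustrative and not part of the report:

```python
# Counts copied from the dataset info.
n_variables = 16
n_observations = 484_568
missing_cells = 121_063
duplicate_rows = 44_983
total_size_mib = 57.3

# Assumption: missing cells are measured against every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)
# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the three results agree with the reported 1.6%, 9.3% and 124.0 B.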

Variables types

Numeric: 6
Categorical: 8
Boolean: 1
Date: 0
URL: 0
Text (Unique): 0
Rejected: 1
Unsupported: 0

Warnings

Dataset has 44983 (9.3%) duplicate rows Warning
Country has a high cardinality: 168 distinct values Warning
Crime_Level_in_the_City_of_Employement has 14639 (3.0%) zeros Zeros
Gender has 32365 (6.7%) missing values Missing
Hair_Color has 32534 (6.7%) missing values Missing
Profession has a high cardinality: 1355 distinct values Warning
Satisfation_with_employer has 17558 (3.6%) missing values Missing
University_Degree has 37275 (7.7%) missing values Missing
Work_Experience_in_Current_Job_[years] is highly correlated with Age (ρ = 0.9689263042) Rejected
Yearly_Income_in_addition_to_Salary_(e.g._Rental_Income) has a high cardinality: 43162 distinct values Warning

Variables

Age
Numeric

Distinct count: 107
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 37.31740437
Minimum: 14
Maximum: 125
Zeros (%): 0.0%

Quantile statistics

Minimum: 14
5-th percentile: 16
Q1: 24
Median: 35
Q3: 48
95-th percentile: 67
Maximum: 125
Range: 111
Interquartile range: 24

Descriptive statistics

Standard deviation: 16.02233104
Coef of variation: 0.4293527728
Kurtosis: 0.07465315374
Mean: 37.31740437
MAD: 13.09071862
Skewness: 0.7035134234
Sum: 18082820
Variance: 256.7150919
Memory size: 3.7 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 14. 24.5 30.5 33.5 37.5 ... 100.5 103.5 110.5 116.5 125. ], "bayesian blocks" binning strategy used)
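
Several of the Age statistics are derived from the others. A short sketch that recomputes them from the reported minimum, maximum, quartiles, mean and standard deviation; the variable names are illustrative, and the same relations apply to the other numeric variables:

```python
# Values copied from the Age statistics.
minimum, maximum = 14, 125
q1, q3 = 24, 48
mean = 37.31740437
std = 16.02233104

value_range = maximum - minimum    # reported as Range
iqr = q3 - q1                      # reported as Interquartile range
coef_of_variation = std / mean     # reported as Coef of variation
variance = std ** 2                # reported as Variance

print(value_range, iqr)
print(f"{coef_of_variation:.6f} {variance:.4f}")
```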
Value Count Frequency (%)
20 12025 2.5%
15 11958 2.5%
16 11955 2.5%
18 11910 2.5%
22 11822 2.4%
24 11725 2.4%
23 11716 2.4%
19 11693 2.4%
26 11649 2.4%
29 11592 2.4%
Other values (97) 366523 75.6%

Minimum 5 values

Value Count Frequency (%)
14 5834 1.2%
15 11958 2.5%
16 11955 2.5%
17 11445 2.4%
18 11910 2.5%

Maximum 5 values

Value Count Frequency (%)
125 1 < 0.1%
119 1 < 0.1%
118 1 < 0.1%
117 1 < 0.1%
116 3 < 0.1%

Body_Height_[cm]
Numeric

Distinct count: 168
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 175.1663275
Minimum: 82
Maximum: 264
Zeros (%): 0.0%

Quantile statistics

Minimum: 82
5-th percentile: 144
Q1: 160
Median: 174
Q3: 190
95-th percentile: 208
Maximum: 264
Range: 182
Interquartile range: 30

Descriptive statistics

Standard deviation: 19.94076119
Coef of variation: 0.1138390093
Kurtosis: -0.3894703161
Mean: 175.1663275
MAD: 16.41884178
Skewness: 0.08406772763
Sum: 84879997
Variance: 397.6339567
Memory size: 3.7 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 82. 96. 103.5 110.5 114.5 ... 240.5 244.5 247.5 254.5 264. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
173 8794 1.8%
169 8784 1.8%
166 8768 1.8%
172 8740 1.8%
165 8723 1.8%
163 8709 1.8%
164 8705 1.8%
170 8670 1.8%
167 8629 1.8%
168 8578 1.8%
Other values (158) 397468 82.0%

Minimum 5 values

Value Count Frequency (%)
82 1 < 0.1%
90 1 < 0.1%
91 1 < 0.1%
95 1 < 0.1%
97 2 < 0.1%

Maximum 5 values

Value Count Frequency (%)
264 1 < 0.1%
261 1 < 0.1%
260 1 < 0.1%
258 2 < 0.1%
257 4 < 0.1%

Country
Categorical

Distinct count: 168
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
Honduras 49423 10.2%
Switzerland 9109 1.9%
Israel 9064 1.9%
Austria 8967 1.9%
Laos 8922 1.8%
Togo 8912 1.8%
Serbia 8900 1.8%
Bulgaria 8810 1.8%
Belarus 8809 1.8%
Paraguay 8800 1.8%
Other values (158) 354852 73.2%

Max length: 24
Mean length: 8.372769147
Min length: 4
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

Crime_Level_in_the_City_of_Employement
Numeric

Distinct count: 199
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 83.11135692
Minimum: 0
Maximum: 203
Zeros (%): 3.0%

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 40
Median: 83
Q3: 124
95-th percentile: 163
Maximum: 203
Range: 203
Interquartile range: 84

Descriptive statistics

Standard deviation: 49.86968341
Coef of variation: 0.600034523
Kurtosis: -1.086283072
Mean: 83.11135692
MAD: 42.80688806
Skewness: 0.05654388845
Sum: 40273104
Variance: 2486.985323
Memory size: 3.7 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 2. 4.5 5.5 6.5 ... 190.5 194.5 198.5 202. 203. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
0 14639 3.0%
5 7478 1.5%
125 4698 1.0%
6 4263 0.9%
130 4044 0.8%
16 4039 0.8%
136 4006 0.8%
117 3937 0.8%
120 3829 0.8%
86 3798 0.8%
Other values (189) 429837 88.7%

Minimum 5 values

Value Count Frequency (%)
0 14639 3.0%
4 3048 0.6%
5 7478 1.5%
6 4263 0.9%
7 38 < 0.1%

Maximum 5 values

Value Count Frequency (%)
203 17 < 0.1%
201 1 < 0.1%
199 1 < 0.1%
198 35 < 0.1%
197 26 < 0.1%

Gender
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing (%): 6.7%
Missing (n): 32365

Value Count Frequency (%)
male 185309 38.2%
other 117608 24.3%
female 102024 21.1%
unknown 29168 6.0%
f 15031 3.1%
0 3063 0.6%
(Missing) 32365 6.7%

Max length: 7
Mean length: 4.665568094
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False
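
The Gender labels overlap: "f" appears next to "female", and "unknown" and "0" appear next to genuinely missing values. A minimal sketch of one way to consolidate them, assuming "f" is shorthand for "female" and that "unknown" and "0" are placeholders for missing data. Both assumptions, and the `CANONICAL` mapping itself, are hypothetical and not confirmed by the report:

```python
from collections import Counter

# Counts copied from the Gender frequency table; None stands for (Missing).
raw_counts = {
    "male": 185_309,
    "other": 117_608,
    "female": 102_024,
    "unknown": 29_168,
    "f": 15_031,
    "0": 3_063,
    None: 32_365,
}

# Hypothetical mapping: these merges are assumptions, not part of the report.
CANONICAL = {"f": "female", "unknown": None, "0": None}


def normalize_gender(label):
    """Map a raw label to its canonical form; unmapped labels pass through."""
    return CANONICAL.get(label, label)


clean_counts = Counter()
for label, count in raw_counts.items():
    clean_counts[normalize_gender(label)] += count

for label, count in clean_counts.most_common():
    print(label, count)
```

Under these assumptions the seven raw categories collapse to male, other, female and missing, and the total stays at 484568 observations.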

Hair_Color
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing (%): 6.7%
Missing (n): 32534

Value Count Frequency (%)
Black 185585 38.3%
Brown 117071 24.2%
Blond 116942 24.1%
Red 29357 6.1%
Unknown 2970 0.6%
0 109 < 0.1%
(Missing) 32534 6.7%

Max length: 7
Mean length: 4.755910419
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False

Housing_Situation
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
Large Apartment 123777 25.5%
Small House 123211 25.4%
Medium House 112771 23.3%
Medium Apartment 97132 20.0%
Large House 20175 4.2%
Small Apartment 7502 1.5%

Max length: 16
Mean length: 13.31865703
Min length: 11
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

Profession
Categorical

Distinct count: 1355
Unique (%): 0.3%
Missing (%): 0.3%
Missing (n): 1331

Value Count Frequency (%)
payment analyst 1093 0.2%
permit records assistant 1086 0.2%
plumbing inspector 1067 0.2%
program and policy specialist 1057 0.2%
pile-driver operator 1049 0.2%
private detective 1043 0.2%
placement staff nurse 1039 0.2%
parking enforcement officer 1033 0.2%
policy writer 1032 0.2%
police administrative aide 1031 0.2%
Other values (1344) 472707 97.6%
(Missing) 1331 0.3%

Max length: 67
Mean length: 20.49720576
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

Satisfation_with_employer
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing (%): 3.6%
Missing (n): 17558

Value Count Frequency (%)
Average 225806 46.6%
Happy 162315 33.5%
Somewhat Happy 71833 14.8%
Unhappy 7056 1.5%
(Missing) 17558 3.6%

Max length: 14
Mean length: 7.222814961
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

Size_of_City
Numeric

Distinct count: 348368
Unique (%): 71.9%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 830799.0354
Minimum: 20
Maximum: 49988987
Zeros (%): 0.0%

Quantile statistics

Minimum: 20
5-th percentile: 14608.35
Q1: 73110
Median: 505183
Q3: 1186683
95-th percentile: 2181221
Maximum: 49988987
Range: 49988967
Interquartile range: 1113573

Descriptive statistics

Standard deviation: 2114494.34
Coef of variation: 2.545133359
Kurtosis: 282.9322695
Mean: 830799.0354
MAD: 741963.5419
Skewness: 15.39816337
Sum: 4.02578627e+11
Variance: 4.471086313e+12
Memory size: 3.7 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[2.00000000e+01 4.40000000e+01 5.05000000e+01 1.00500000e+02 1.02000000e+02 ... 4.74889990e+07 4.74937625e+07 4.90803945e+07 4.91118075e+07 4.99889870e+07], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
62624 33 < 0.1%
76719 33 < 0.1%
273129 32 < 0.1%
549 22 < 0.1%
51672 22 < 0.1%
38039 22 < 0.1%
61525 22 < 0.1%
98733 22 < 0.1%
21939 22 < 0.1%
94867 21 < 0.1%
Other values (348358) 484317 99.9%

Minimum 5 values

Value Count Frequency (%)
20 1 < 0.1%
23 1 < 0.1%
30 1 < 0.1%
36 1 < 0.1%
38 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
49988987 1 < 0.1%
49929902 1 < 0.1%
49888283 1 < 0.1%
49884122 1 < 0.1%
49815568 1 < 0.1%

Total_Yearly_Income_[EUR]
Numeric

Distinct count: 435635
Unique (%): 89.9%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 129937.9202
Minimum: 339.36
Maximum: 2548790.96
Zeros (%): 0.0%

Quantile statistics

Minimum: 339.36
5-th percentile: 14052.227
Q1: 38169.0975
Median: 84016.575
Q3: 172344.61
95-th percentile: 398664.727
Maximum: 2548790.96
Range: 2548451.6
Interquartile range: 134175.5125

Descriptive statistics

Standard deviation: 138414.2955
Coef of variation: 1.065234038
Kurtosis: 12.29588089
Mean: 129937.9202
MAD: 96658.93595
Skewness: 2.679302693
Sum: 6.296375813e+10
Variance: 1.915851719e+10
Memory size: 3.7 MiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[3.39360000e+02 1.36209500e+03 2.15824500e+03 3.07210000e+03 3.61384500e+03 ... 1.05149878e+06 1.20139256e+06 1.41001710e+06 1.62520507e+06 2.54879096e+06], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
30236.24 18 < 0.1%
48545.28 17 < 0.1%
49988.87 17 < 0.1%
10103.82 17 < 0.1%
16724.74 17 < 0.1%
230819.08 17 < 0.1%
15217.59 17 < 0.1%
24908.77 17 < 0.1%
58380.65 17 < 0.1%
265358.26 17 < 0.1%
Other values (435625) 484397 > 99.9%

Minimum 5 values

Value Count Frequency (%)
339.36 2 < 0.1%
485.3 2 < 0.1%
571.43 1 < 0.1%
636.31 1 < 0.1%
799.05 2 < 0.1%

Maximum 5 values

Value Count Frequency (%)
2548790.96 1 < 0.1%
2391657.56 1 < 0.1%
2332659.88 1 < 0.1%
2319623.71 1 < 0.1%
2285303.38 1 < 0.1%

University_Degree
Categorical

Distinct count: 6
Unique (%): < 0.1%
Missing (%): 7.7%
Missing (n): 37275

Value Count Frequency (%)
Bachelor 183615 37.9%
No 115922 23.9%
Master 115616 23.9%
PhD 29212 6.0%
0 2928 0.6%
(Missing) 37275 7.7%

Max length: 8
Mean length: 5.359101303
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False

Wears_Glasses
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
1 242566 50.1%
0 242002 49.9%

Work_Experience_in_Current_Job_[years]
Highly correlated

This variable is highly correlated with Age and should be ignored for analysis

Correlation: 0.9689263042
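
The rejection rests on a correlation coefficient between Age and Work_Experience_in_Current_Job_[years]. A minimal sketch of how such a coefficient can be computed, assuming the reported ρ is Pearson's coefficient. The `pearson` helper is illustrative, and the five value pairs are the distinct (Age, Work_Experience) combinations visible in the report's first sample rows, so the result only approximates the full-dataset figure:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Five distinct (Age, Work_Experience) pairs read from the first sample rows.
ages = [48, 18, 34, 31, 58]
experience = [18.0, 10.0, 17.0, 15.0, 22.0]

r = pearson(ages, experience)
print(f"rho on the five sample pairs: {r:.3f}")
```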

Year_of_Record
Numeric

Distinct count: 39
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 1999.858486
Minimum: 1981
Maximum: 2019
Zeros (%): 0.0%

Quantile statistics

Minimum: 1981
5-th percentile: 1982
Q1: 1990
Median: 2000
Q3: 2010
95-th percentile: 2018
Maximum: 2019
Range: 38
Interquartile range: 20

Descriptive statistics

Standard deviation: 11.32490411
Coef of variation: 0.005662852739
Kurtosis: -1.202838462
Mean: 1999.858486
MAD: 9.807837627
Skewness: -0.0001560080668
Sum: 969067427
Variance: 128.253453
Memory size: 3.7 MiB

Histogram with fixed size bins (bins=39)
Histogram with variable size bins (bins=[1981. 1981.5 2018.5 2019. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
1981 16016 3.3%
1989 12679 2.6%
1999 12634 2.6%
2008 12603 2.6%
1993 12598 2.6%
1982 12543 2.6%
2011 12496 2.6%
1995 12474 2.6%
2015 12444 2.6%
2009 12431 2.6%
Other values (29) 355650 73.4%

Minimum 5 values

Value Count Frequency (%)
1981 16016 3.3%
1982 12543 2.6%
1983 12354 2.5%
1984 12210 2.5%
1985 12407 2.6%

Maximum 5 values

Value Count Frequency (%)
2019 12332 2.5%
2018 12065 2.5%
2017 12289 2.5%
2016 12260 2.5%
2015 12444 2.6%

Yearly_Income_in_addition_to_Salary_(e.g._Rental_Income)
Categorical

Distinct count: 43162
Unique (%): 8.9%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
0 EUR 437276 90.2%
71771.15 EUR 17 < 0.1%
55555.43 EUR 16 < 0.1%
10852.2 EUR 16 < 0.1%
74977.26 EUR 16 < 0.1%
124673.9 EUR 16 < 0.1%
103818.67 EUR 16 < 0.1%
45695.21 EUR 16 < 0.1%
108326.78 EUR 16 < 0.1%
19850.86 EUR 16 < 0.1%
Other values (43152) 47147 9.7%

Max length: 13
Mean length: 5.688006637
Min length: 5
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True
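
This variable is typed as Categorical because every value carries an " EUR" suffix, which also explains the high-cardinality warning. A minimal sketch of converting such strings to numbers, assuming every value has the "amount EUR" shape seen in the frequency table; `parse_eur` is a hypothetical helper, not part of the report:

```python
def parse_eur(text):
    """Convert a string such as '71771.15 EUR' to a float.

    Assumes the 'amount EUR' format seen in the frequency table.
    """
    amount, _, unit = text.strip().partition(" ")
    if unit != "EUR":
        raise ValueError(f"unexpected unit in {text!r}")
    return float(amount)


# Labels copied from the frequency table.
samples = ["0 EUR", "71771.15 EUR", "55555.43 EUR", "10852.2 EUR"]
parsed = [parse_eur(s) for s in samples]
print(parsed)

# Share of records whose additional income is exactly zero.
zero_share = 437_276 / 484_568
print(f"{100 * zero_share:.1f}% zeros")
```

Once parsed, the column can be profiled as a numeric variable, with about nine in ten values equal to zero.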

Correlations

Missing values

Sample

First rows

AgeBody_Height_[cm]CountryCrime_Level_in_the_City_of_EmployementGenderHair_ColorHousing_SituationProfessionSatisfation_with_employerSize_of_CityTotal_Yearly_Income_[EUR]University_DegreeWears_GlassesWork_Experience_in_Current_Job_[years]Year_of_RecordYearly_Income_in_addition_to_Salary_(e.g._Rental_Income)
048182Afghanistan34maleBlackMedium Houseit auditorUnhappy82357266621.98NaN018.01981.00 EUR
148182Afghanistan34maleBlackMedium Houseit auditorUnhappy82357266621.98NaN018.01981.00 EUR
218190Afghanistan25otherRedMedium Houseproduct advisorUnhappy506689102489.14Bachelor010.01981.00 EUR
318190Afghanistan25otherRedMedium Houseproduct advisorUnhappy506689102489.14Bachelor010.01981.00 EUR
434191Albania66maleBlondLarge Housetechnical investigatorNaN190497056987.96Bachelor017.01981.00 EUR
534191Albania66maleBlondLarge Housetechnical investigatorNaN190497056987.96Bachelor017.01981.00 EUR
631200Albania64otherBlackMedium Houseprobation officer traineeAverage311975848391.66No015.01981.00 EUR
731200Albania64otherBlackMedium Houseprobation officer traineeAverage311975848391.66No015.01981.00 EUR
858167Albania76maleBlondLarge Houselead designerSomewhat Happy170666461619.14Bachelor022.01981.00 EUR
958167Albania76maleBlondLarge Houselead designerSomewhat Happy170666461619.14Bachelor022.01981.00 EUR
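
The first rows come in identical pairs, which is consistent with the duplicate-row warning. A minimal sketch of counting and removing such duplicates, using only five of the sixteen columns as read from the ten first rows. It assumes that every occurrence after the first counts as a duplicate, which may differ from how the report counts them:

```python
from collections import Counter

# (Age, Body_Height_[cm], Country, Gender, Hair_Color) from the ten first rows.
first_rows = [
    (48, 182, "Afghanistan", "male", "Black"),
    (48, 182, "Afghanistan", "male", "Black"),
    (18, 190, "Afghanistan", "other", "Red"),
    (18, 190, "Afghanistan", "other", "Red"),
    (34, 191, "Albania", "male", "Blond"),
    (34, 191, "Albania", "male", "Blond"),
    (31, 200, "Albania", "other", "Black"),
    (31, 200, "Albania", "other", "Black"),
    (58, 167, "Albania", "male", "Blond"),
    (58, 167, "Albania", "male", "Blond"),
]

counts = Counter(first_rows)
# Assumption: every occurrence after the first counts as a duplicate.
n_duplicates = sum(c - 1 for c in counts.values())
# Counter keeps keys in first-seen order, so this preserves row order.
deduplicated = list(counts)
print(n_duplicates, len(deduplicated))
```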

Last rows

AgeBody_Height_[cm]CountryCrime_Level_in_the_City_of_EmployementGenderHair_ColorHousing_SituationProfessionSatisfation_with_employerSize_of_CityTotal_Yearly_Income_[EUR]University_DegreeWears_GlassesWork_Experience_in_Current_Job_[years]Year_of_RecordYearly_Income_in_addition_to_Salary_(e.g._Rental_Income)
48455820141Honduras130fBlondSmall ApartmentfarmerHappy8037227301.36Bachelor14.92019.055552.87 EUR
48455933194Honduras151maleBlackMedium ApartmentphotogrammetristHappy597563233931.75Master015.02019.00 EUR
48456020166Honduras130fBlackSmall Apartmentmechanical engineerHappy60153230608.84Bachelor010.02019.00 EUR
48456118148Honduras125fNaNMedium ApartmentombudspersonSomewhat Happy1130948144594.15Bachelor06.32019.00 EUR
48456219198Honduras127NaNBlackMedium Apartmentmarine electronics technicianHappy92505692947.49No19.02019.00 EUR
48456372174Honduras185fUnknownSmall Apartmentmeeting and convention plannerSomewhat Happy1194468139332.94Master127.02019.00 EUR
48456427168Honduras143maleBlondMedium Apartmenthomeowner mortgage service managerHappy97087189984.65Bachelor011.02019.00 EUR
48456536158Honduras155maleBlondMedium Apartmentfire inspectorHappy906066133001.85No013.02019.00 EUR
48456630155Honduras147otherBlondMedium Apartmentquality control case reviewerAverage40288308244.07Master114.02019.023502.26 EUR
48456724158Honduras138otherBrownSmall Apartmentradio repair mechanicAverage54247297970.48Bachelor112.02019.00 EUR